11 research outputs found

ArAutoSenti: Automatic annotation and new tendencies for sentiment classification of Arabic messages

Author: Azouaou Faical
Chiclana Francisco
Guellil Imane
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/08/2020

The file attached to this record is the author's final peer-reviewed version. A corpus-based sentiment analysis approach for messages written in Arabic and its dialects is presented and implemented. The originality of this approach resides in the automatic construction of the annotated sentiment corpus, which relies mainly on a sentiment lexicon that is also constructed automatically. For the classification step, shallow and deep classifiers are used, with features extracted by applying word embedding models. To validate the constructed corpus, we reviewed it manually and found that 85.17% of it was correctly annotated. This approach is applied to the under-resourced Algerian dialect and tested on two external test corpora presented in the literature. The obtained results are very encouraging, with an F1-score of up to 88% on the first test corpus and up to 81% on the second. These results represent improvements of 20% and 6%, respectively, over existing work in the research literature.
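As a rough illustration of lexicon-driven automatic annotation, the sketch below labels a message by summing word polarities and keeps only messages with a clear label. It is not the authors' pipeline: the toy English lexicon, the function names and the scoring rule are all assumptions made for the example.

```python
# Hypothetical toy lexicon (word -> polarity); the paper's lexicon is built automatically.
TOY_LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}


def annotate(message, lexicon=TOY_LEXICON):
    """Label a message by summing the polarity of the words found in the lexicon."""
    score = sum(lexicon.get(token, 0) for token in message.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no clear polarity: the message is left unlabelled


def build_corpus(messages, lexicon=TOY_LEXICON):
    """Return (message, label) pairs, dropping messages that got no label."""
    labelled = ((m, annotate(m, lexicon)) for m in messages)
    return [(m, label) for m, label in labelled if label is not None]
```

A corpus built this way could then be spot-checked by hand, which is the kind of manual review the abstract reports.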

De Montfort University Open Research Archive

Edinburgh_UCL_Health@SMM4H'22: From Glove to Flair for handling imbalanced healthcare corpora related to Adverse Drug Events, Change in medication and self-reporting vaccination

Author: Alex Beatrice
Guellil Imane
Sun Tony
Wu Honghan
Wu Jinge
Publication date: 01/10/2022

This paper reports on the performance of Edinburgh_UCL_Health’s models in the Social Media Mining for Health (SMM4H) 2022 shared tasks. Our team participated in the tasks related to the identification of Adverse Drug Events (ADEs), the classification of change in medication (change-med) and the classification of self-report of vaccination (self-vaccine). Our best-performing models are based on DeepADEMiner (with respective F1 = 0.64, 0.62 and 0.39 for ADE identification), on a GloVe model trained on Twitter (with F1 = 0.11 for change-med) and finally on a stacked embedding including a layer of GloVe embedding and two layers of Flair embedding (with F1 = 0.77 for self-report).

PubMed Central

UCL Discovery

Edinburgh Research Explorer

Detecting Adverse Drug Events from social media: A brief literature review

Author: Abboud Massi-Nissa
Alex Beatrice
Berrachedi Yousra
Chenni Nidhaleddine
Guellil Imane
Wu Honghan
Wu Jinge
Publication date: 07/12/2022

Edinburgh Research Explorer

FLAP: A framework for linking free-text addresses to the Ordnance Survey Unique Property Reference Number database

Author: Alex Beatrice
Casey Arlene
Guellil Imane
Guthrie Bruce
MacRae Clare
Marwick Charis
Suarez Paniagua Victor
Wu Honghan
Zhang Huayu
Publication date: 28/11/2023

Introduction: Linking free-text addresses to unique identifiers in a structural address database [the Ordnance Survey unique property reference number (UPRN) in the United Kingdom (UK)] is a necessary step for downstream geospatial analysis in many digital health systems, e.g., for identification of care home residents, understanding housing transitions in later life, and informing decision making on geographical health and social care resource distribution. However, there is a lack of open-source tools for this task with performance validated in a test data set. Methods: In this article, we propose a generalisable solution (A Framework for Linking free-text Addresses to Ordnance Survey UPRN database, FLAP) based on a machine learning–based matching classifier coupled with a fuzzy aligning algorithm for feature generation with better performance than existing tools. The framework is implemented in Python as an Open Source tool (available at Link). We tested the framework in a real-world scenario of linking individuals’ (n = 771,588) addresses recorded as free text in the Community Health Index (CHI) of National Health Service (NHS) Tayside and NHS Fife to the Unique Property Reference Number database (UPRN DB). Results: We achieved an adjusted matching accuracy of 0.992 in a test data set randomly sampled (n = 3,876) from NHS Tayside and NHS Fife CHI addresses. FLAP showed robustness against input variations including typographical errors, alternative formats, and partially incorrect information. It has also improved usability compared to existing solutions, allowing the use of a customised threshold of matching confidence and selection of top n candidate records. The use of machine learning also provides better adaptability of the tool to new data and enables continuous improvement. Discussion: In conclusion, we have developed a framework, FLAP, for linking free-text UK addresses to the UPRN DB with good performance and usability in a real-world task.
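To make the idea of a confidence threshold and top n candidate selection concrete, here is a minimal sketch using only Python's standard library. It is not FLAP's implementation: a plain string-similarity ratio from `difflib` stands in for the machine learning matching classifier, and the function names, identifiers and addresses are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(address):
    """Lower-case, drop commas, and collapse whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


def top_candidates(query, reference, n=3, threshold=0.6):
    """Return up to n (score, identifier, address) tuples scoring at or above threshold.

    `reference` maps a property identifier to its address string. The score is a
    simple similarity ratio, used here in place of a trained matching classifier.
    """
    q = normalise(query)
    scored = [
        (SequenceMatcher(None, q, normalise(addr)).ratio(), ident, addr)
        for ident, addr in reference.items()
    ]
    kept = [item for item in scored if item[0] >= threshold]
    return sorted(kept, key=lambda item: item[0], reverse=True)[:n]


# Made-up identifiers and addresses, for illustration only.
REFERENCE = {"U1": "12 High Street, Dundee", "U2": "45 Low Road, Fife"}
best = top_candidates("12 high stret dundee", REFERENCE, n=1)
```

Raising `threshold` trades recall for precision, while `n` controls how many candidate records are returned for manual or downstream review.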

Edinburgh Research Explorer

University of Dundee Online Publications

A semi-supervised approach for sentiment analysis of Arab(ic+izi) messages: Application to the Algerian dialect

Author: Adeel Ahsan
Azouaou Faical
Benali Fodil
Dashtipour Kia
Gogate Mandar
Guellil Imane
Hachani Ala-Eddine
Hussain Amir
Ieracitano Cosimo
Kashani Reza
Publication venue: Springer
Publication date: 01/01/2021

In this paper, we propose a semi-supervised approach for sentiment analysis of Arabic and its dialects. This approach is based on a sentiment corpus, constructed automatically and reviewed manually by native speakers of the Algerian dialect. It consists of constructing and applying a set of deep learning algorithms to classify the sentiment of Arabic messages as positive or negative. It was applied to Facebook messages written in Modern Standard Arabic (MSA) as well as in Algerian dialect (DALG, a low-resourced dialect spoken by more than 40 million people), in both Arabic and Arabizi scripts. To handle Arabizi, we consider two options: transliteration (widely used in the research literature for handling Arabizi) and translation (never used in the research literature for handling Arabizi). To highlight the effectiveness of a semi-supervised approach, we carried out different experiments using both corpora for training (i.e. the corpus constructed automatically and the one that was reviewed manually). The experiments were done on many test corpora dedicated to MSA/DALG, which were proposed and evaluated in the research literature. Both shallow and deep learning classifiers are used, such as Random Forest (RF), Logistic Regression (LR), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). These classifiers are combined with word embedding models such as Word2vec and fastText for sentiment classification. Experimental results (F1 score up to 95% for intrinsic experiments and up to 89% for extrinsic experiments) showed that the proposed system outperforms the existing state-of-the-art methodologies (the best improvement is up to 25%).

Aston Publications Explorer

Repository@Napier

FLAP: a framework for linking free-text addresses to the Ordnance Survey Unique Property Reference Number database

Author: Arlene Casey
Beatrice Alex
Bruce Guthrie
Charis Marwick
Clare MacRae
Honghan Wu
Huayu Zhang
Imane Guellil
Víctor Suárez-Paniagua
Publication venue: Frontiers Media S.A.
Publication date: 01/11/2023

Introduction: Linking free-text addresses to unique identifiers in a structural address database [the Ordnance Survey unique property reference number (UPRN) in the United Kingdom (UK)] is a necessary step for downstream geospatial analysis in many digital health systems, e.g., for identification of care home residents, understanding housing transitions in later life, and informing decision making on geographical health and social care resource distribution. However, there is a lack of open-source tools for this task with performance validated in a test data set. Methods: In this article, we propose a generalisable solution (A Framework for Linking free-text Addresses to Ordnance Survey UPRN database, FLAP) based on a machine learning–based matching classifier coupled with a fuzzy aligning algorithm for feature generation with better performance than existing tools. The framework is implemented in Python as an Open Source tool (available at Link). We tested the framework in a real-world scenario of linking individuals’ (n = 771,588) addresses recorded as free text in the Community Health Index (CHI) of National Health Service (NHS) Tayside and NHS Fife to the Unique Property Reference Number database (UPRN DB). Results: We achieved an adjusted matching accuracy of 0.992 in a test data set randomly sampled (n = 3,876) from NHS Tayside and NHS Fife CHI addresses. FLAP showed robustness against input variations including typographical errors, alternative formats, and partially incorrect information. It has also improved usability compared to existing solutions, allowing the use of a customised threshold of matching confidence and selection of top n candidate records. The use of machine learning also provides better adaptability of the tool to new data and enables continuous improvement. Discussion: In conclusion, we have developed a framework, FLAP, for linking free-text UK addresses to the UPRN DB with good performance and usability in a real-world task.

Directory of Open Access Journals